Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expose optional input parameter disk_size for kraken2 standalone wfs #316

Merged
merged 7 commits into from
Feb 8, 2024

Conversation

kapsakcj
Copy link
Contributor

@kapsakcj kapsakcj commented Jan 23, 2024

This PR closes #315

πŸ—‘οΈ This dev branch should be deleted after merging to main.

🧠 Aim, Context and Functionality

[copied from GH issue]

disk_size is not exposed as an optional input parameter in the kraken2_PE_PHB and kraken2_SE_PHB workflows. A user ran into disk space issues/errors using this workflow with large databases and has no way of requesting a VM with a larger disk.

πŸ› οΈ Impacted Workflows/Tasks & Changes Being Made

This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version: No

Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : No

This code change affects both standalone kraken2 workflows:

  • kraken2_PE_PHB
  • kraken2_SE_PHB
  • TheiaMeta_Illumina_PE_PHB (I missed this previously. Didn't realize TheiaMeta calls this task!)

πŸ“‹ Workflow/Task Step Changes

πŸ”„ Data Processing

Nothing has changed about the way the workflow runs

Docker/software or software versions changed: No

Databases or database versions changed: No

Data processing/commands changed: No

File processing changed: No

Compute resources changed: Yes, user can now alter disk_size from the default of 100 GB

➑️ Inputs

None

⬅️ Outputs

None

πŸ§ͺ Testing

Test Dataset

tested on one ILMN PE bacterial sample (see link to test Terra workflow below), and requested 500 as the disk_size for the task, and Terra/cromwell did use a VM with a disk size of 500 GB πŸ‘

Commandline Testing with MiniWDL or Cromwell (optional)

Unnecessary, only test in Terra.

Terra Testing

successful workflow here: https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/dccf8cf5-ab64-4df6-bad8-9cbdd9fd718f

proof that the VM used a 500GB disk:
image

Suggested Scenarios for Reviewer to Test

Would be good to test out with large kraken2 databases. My test uses the "kraken2 standard" database which is 38GB compressed and estimated 70-ish GB when uncompressed.

Theiagen Version Release Testing (optional)

  • Will changes require functional or validation testing (checking outputs etc) during the release? functional testing
  • Do new samples need to be added to validation datasets? If so, upload these to the appropriate validation workspace Google bucket (). Please describe the new samples here and why these have been chosen. No
  • Are there any output files that should be checked after running the version release testing? No

πŸ”¬ Final Developer Checklist

  • The workflow/task has been tested locally and results, including file contents, are as anticipated
  • The workflow/task has been tested on Terra and results, including file contents, are as anticipated
  • The CI/CD has been adjusted and tests are passing (to be completed by Theiagen developer)
  • Code changes follow the style guide

🎯 Reviewer Checklist

  • All impacted workflows/tasks have been tested on Terra with a different dataset than used for development
  • All reviewer-suggested scenarios have been tested and any additional
  • All changed results have been confirmed to be accurate
  • All workflows/tasks impacted by change/s have been tested using a standard validation dataset to ensure no unintended change of functionality
  • All code adheres to the style guide
  • MD5 sums have been updated
  • The PR author has addressed all comments

πŸ—‚οΈ Associated Documentation (to be completed by Theiagen developer)

  • Relevant documentation on the Public Health Resources "PHB Main" has been updated
  • Workflow diagrams have been updated to reflect changes

…. affects both kraken2_PE_PHB and kraken2_SE_PHB workflows
@kapsakcj kapsakcj changed the title expose optional input parameter disk_size for kraken2 standalone task expose optional input parameter disk_size for kraken2 standalone wfs Jan 23, 2024
@kapsakcj
Copy link
Contributor Author

Reminder: tag Ines when ready for review

@kapsakcj kapsakcj requested a review from cimendes February 6, 2024 15:40
@kapsakcj
Copy link
Contributor Author

kapsakcj commented Feb 7, 2024

I ran one final test of kraken2_SE_PHB workflow after making the last commit to update the CI: https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/46106d91-fe26-4a13-8456-ddb27ca6743a

It allocated a VM with 500GB as I requested, so it ran as expected βœ…

@cimendes Would be great if you could test TheiaMeta_Illumina_PE (when you have time of course, this is lower priority)

@kapsakcj kapsakcj marked this pull request as ready for review February 7, 2024 14:52
@kapsakcj
Copy link
Contributor Author

kapsakcj commented Feb 7, 2024

Just realized that this change also impacts the readQC subworkflows as well...

workflows/utilities/wf_read_QC_trim_pe.wdl
workflows/utilities/wf_read_QC_trim_se.wdl:

both of these subworkflows call the kraken2_standalone task located in tasks/taxon_id/task_kraken2.wdl

so I'll run at least one theiaprok_illumina_PE_PHB workflow with the call_kraken boolean enabled

@kapsakcj
Copy link
Contributor Author

kapsakcj commented Feb 7, 2024

and just realized that I have to add this option to the read_QC subworkflows as well....

commit incoming. Maybe hold off on the review for a moment

@kapsakcj kapsakcj marked this pull request as draft February 7, 2024 15:15
@kapsakcj
Copy link
Contributor Author

kapsakcj commented Feb 7, 2024

OK here's a theiaprok_illumina_pe_PHB test where I set the new optional inputs (for read_QC_pe subworkflow) kraken_disk_size to 500 and kraken_memory to 64: https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/workflows/theiagen-validations/TheiaProk_Illumina_PE_PHB

EDIT: ran as expected and allocated a VM with 500GB disk and 64GB memory βœ…

@kapsakcj
Copy link
Contributor Author

kapsakcj commented Feb 7, 2024

and here's a test with TheiaProk_Illumina_SE_PHB with kraken_disk_size set to 250 and kraken_memory set to 64: https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/3ed3bae5-ba78-47d1-b75b-1c99131fdeda

EDIT: ran as expected and allocated a VM with 250GB disk and 64GB memory βœ…

@kapsakcj kapsakcj marked this pull request as ready for review February 7, 2024 15:42
@cimendes
Copy link
Member

cimendes commented Feb 8, 2024

TheiaMeta_Illumina_PE:

New input variables are present as expected:
image

βœ… Run test successful: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/24bbace4-3074-4c66-883c-aa2fba93bfaf

@cimendes
Copy link
Member

cimendes commented Feb 8, 2024

Kraken2_PE:

New input parameter present as expected:
image

βœ… Successful run: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/785c725e-8d55-4327-9bc8-a324b5ec2ad5

@cimendes
Copy link
Member

cimendes commented Feb 8, 2024

TheiaProk_Illumina_PE:

Inputs present as expected (pity WDL is depressing accessing these optional inputs directly through the Kraken2 task.. workflows in workflows 🀷🏻)
image

βœ… Successful run: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/3968fa05-d2f6-4693-b7df-4833a095ff79

@cimendes
Copy link
Member

cimendes commented Feb 8, 2024

Last but not least, documentation!!

  • All changes are present on the PHB main documentation page 🌟
    • βœ… Added TheiaProk
    • βœ… Added TheiaMeta

@cimendes
Copy link
Member

cimendes commented Feb 8, 2024

All is looking good! Great job! I'm approving the PR!

Copy link
Member

@cimendes cimendes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

@cimendes cimendes merged commit 79dfcaf into main Feb 8, 2024
14 checks passed
@cimendes cimendes deleted the cjk-kraken-standalone-disk-size branch February 8, 2024 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature request] expose disk_size optional input for kraken workflows
2 participants